The search functionality is under construction.

Author Search Result

[Author] Hideharu AMANO(66hit)

61-66hit(66hit)

  • A Routing Algorithm for Multihop WDM Ring

    Xiaoshe DONG  Tomohiro KUDOH  Hideharu AMANO  

     
    PAPER-Computer Networks

      Vol:
    E82-D No:2
      Page(s):
    422-430

    Divisor-Skip Wavelength Division Multiplexing (DS-WDM) ring is an optical interconnection network for workstation clusters or parallel machines which can connect various number of nodes easily using wavelength division multiplexing techniques. However, the wavelength-ordered routing algorithm proposed for the DS-WDM ring requires complicated processes in each router. Here, a new routing algorithm called the comparing dimensional number routing algorithm for the DS-WDM ring is proposed and evaluated. Although the diameter and average distance are almost same as traditional wavelength-ordered routing, the cost and latency are much reduced.

  • Fault Tolerance of the TBSF (Tandem Banyan Switching Fabrics) and PBSF (Piled Banyan Switching Fabrics)

    Akira FUNAHASHI  Toshihiro HANAWA  Hideharu AMANO  

     
    PAPER-Fault Diagnosis/Tolerance

      Vol:
    E79-D No:8
      Page(s):
    1180-1189

    Multistage Interconnection Networks (MIN) with multiple outlets are networks which can support higher bandwidth than those of nonblocking networks by passing multiple packets to the same destination. Fault recovery mechanisms are proposed for two of such networks (TBSF/PBSF) with the best use of their inherent fault tolerant capability. With these mechanisms, on-the-fly fault recovery is possible for multiple faults on switching elements. For the link fault, the networks are reconfigured after fault diagnosis, and the network is available with some performance degradation. The bandwidth degradation under multiple faults on link/element is analyzed with both theoretical models and simulation. Through the analysis, F-PBSF shows high fault tolerance under high traffic load and low reliability by using 3 or more banyan networks.

  • Reconfigurable Out-of-Order System for Fluid Dynamics Computation Using Unstructured Mesh

    Takayuki AKAMINE  Mohamad Sofian ABU TALIP  Yasunori OSANA  Naoyuki FUJITA  Hideharu AMANO  

     
    PAPER-Computer System

      Vol:
    E97-D No:5
      Page(s):
    1225-1234

    Computational fluid dynamics (CFD) is an important tool for designing aircraft components. FaSTAR (Fast Aerodynamics Routines) is one of the most recent CFD packages and has various subroutines. However, its irregular and complicated data structure makes it difficult to execute FaSTAR on parallel machines due to memory access problem. The use of a reconfigurable platform based on field programmable gate arrays (FPGAs) is a promising approach to accelerating memory-bottlenecked applications like FaSTAR. However, even with hardware execution, a large number of pipeline stalls can occur due to read-after-write (RAW) data hazards. Moreover, it is difficult to predict when such stalls will occur because of the unstructured mesh used in FaSTAR. To eliminate this problem, we developed an out-of-order mechanism for permuting the data order so as to prevent RAW hazards. It uses an execution monitor and a wait buffer. The former identifies the state of the computation units, and the latter temporarily stores data to be processed in the computation units. This out-of-order mechanism can be applied to various types of computations with data dependency by changing the number of execution monitors and wait buffers in accordance with the equations used in the target computation. An out-of-order system can be reconfigured by automatic changing of the parameters. Application of the proposed mechanism to five subroutines in FaSTAR showed that its use reduces the number of stalls to less than 1% compared to without the mechanism. In-order execution was speeded up 2.6-fold and software execution was speeded up 2.9-fold using an Intel Core 2 Duo processor with a reasonable amount of overhead.

  • The Implementation of a Hybrid Router and Dynamic Switching Algorithm on a Multi-FPGA System

    Tomoki SHIMIZU  Kohei ITO  Kensuke IIZUKA  Kazuei HIRONAKA  Hideharu AMANO  

     
    PAPER

      Pubricized:
    2022/06/30
      Vol:
    E105-D No:12
      Page(s):
    2008-2018

    The multi-FPGA system known as, the Flow-in-Cloud (FiC) system, is composed of mid-range FPGAs that are directly interconnected by high-speed serial links. FiC is currently being developed as a server for multi-access edge computing (MEC), which is one of the core technologies of 5G. Because the applications of MEC are sometimes timing-critical, a static time division multiplexing (STDM) network has been used on FiC. However, the STDM network exhibits the disadvantage of decreasing link utilization, especially under light traffic. To solve this problem, we propose a hybrid router that combines packet switching for low-priority communication and STDM for high-priority communication. In our hybrid network, the packet switching uses slots that are unused by the STDM; therefore, best-effort communication by packet switching and QoS guarantee communication by the STDM can be used simultaneously. Furthermore, to improve each link utilization under a low network traffic load, we propose a dynamic communication switching algorithm. In our algorithm, each router monitors the network load metrics, and according to the metrics, timing-critical tasks select the STDM according to the metrics only when congestion occurs. This can achieve both QoS guarantee and efficient utilization of each link with a small resource overhead. In our evaluation, the dynamic algorithm was up to 24.6% faster on the execution time with a high network load compared to the packet switching on a real multi-FPGA system with 24 boards.

  • A Mapping Method for Multi-Process Execution on Dynamically Reconfigurable Processors

    Vu MANH TUAN  Hideharu AMANO  

     
    PAPER-Computer Systems

      Vol:
    E91-D No:9
      Page(s):
    2312-2322

    The multi-process execution in dynamically reconfigurable processors is a technique to enhance throughput by trying to exploit more inherent parallelism of applications. Basically, a total process for an application is divided into small processes, assigned into limited areas of a reconfigurable array, and concurrently executed in a pipelined manner. In order to improve the efficiency of the multi-process execution, a systematic method for mapping processes onto a reconfigurable array consisting of multiple hardware execution units is essential. This paper proposes and investigates a systematic method for mapping an application modeled as a Kahn Process Network onto a dynamically reconfigurable processing array. In order to execute streaming applications in a pipelined manner, the size of Tiles, which is a unit area of dynamically reconfigurable array, and the grouping of processes are adjusted. Using real applications such as DCT, JPEG encoder and Turbo encoder, the impact of different versions mapped onto the NEC Dynamically Reconfigurable Processor on performance is evaluated. Evaluation results show that our proposed mapping algorithm achieves the best performance in terms of the throughput and the execution time.

  • Partial Reconfiguration of Flux Limiter Functions in MUSCL Scheme Using FPGA

    Mohamad Sofian ABU TALIP  Takayuki AKAMINE  Yasunori OSANA  Naoyuki FUJITA  Hideharu AMANO  

     
    PAPER-Computer System

      Vol:
    E95-D No:10
      Page(s):
    2369-2376

    Computational Fluid Dynamics (CFD) is used as a common design tool in the aerospace industry. UPACS, a package for CFD, is convenient for users, since a customized simulator can be built just by selecting desired functions. The problem is its computation speed, which is difficult to enhance by using the clusters due to its complex memory access patterns. As an economical solution, accelerators using FPGAs are hopeful candidate. However, the total scale of UPACS is too large to be implemented on small numbers of FPGAs. For cost efficient implementation, partial reconfiguration which dynamically loads only required functions is proposed in this paper. Here, the MUSCL scheme, which is used frequently in UPACS, is selected as a target. Partial reconfiguration is applied to the flux limiter functions (FLF) in MUSCL. Four FLFs are implemented for Turbulence MUSCL (TMUSCL) and eight FLFs are for Convection MUSCL (CMUSCL). All FLFs are developed independently and separated from the top MUSCL module. At start-up, only required FLFs are selected and deployed in the system without interfering the other modules. This implementation has successfully reduced the resource utilization by 44% to 63%. Total power consumption also reduced by 33%. Configuration speed is improved by 34-times faster as compared to full reconfiguration method. All implemented functions achieved at least 17 times speed-up performance compared with the software implementation.

61-66hit(66hit)